Skip to content

Conversation

@oneonestar
Copy link
Member

@oneonestar oneonestar commented Oct 9, 2025

Description

Expose Trino backend state via JMX

Additional context and related issues

JMX looks like this:

# TYPE io_trino_gateway_ha_router_name_ClusterStats_cluster_id_trino1_QueuedQueryCount gauge
io_trino_gateway_ha_router_name_ClusterStats_cluster_id_trino1_QueuedQueryCount 0.0
# TYPE io_trino_gateway_ha_router_name_ClusterStats_cluster_id_trino1_TrinoStatusUnknown gauge
io_trino_gateway_ha_router_name_ClusterStats_cluster_id_trino1_TrinoStatusUnknown 0.0
# TYPE io_trino_gateway_ha_router_name_ClusterStats_cluster_id_trino1_TrinoStatusHealthy gauge
io_trino_gateway_ha_router_name_ClusterStats_cluster_id_trino1_TrinoStatusHealthy 1.0
# TYPE io_trino_gateway_ha_router_name_ClusterStats_cluster_id_trino1_RunningQueryCount gauge
io_trino_gateway_ha_router_name_ClusterStats_cluster_id_trino1_RunningQueryCount 0.0
# TYPE io_trino_gateway_ha_router_name_ClusterStats_cluster_id_trino1_TrinoStatusPending gauge
io_trino_gateway_ha_router_name_ClusterStats_cluster_id_trino1_TrinoStatusPending 0.0
# TYPE io_trino_gateway_ha_router_name_ClusterStats_cluster_id_trino1_TrinoStatusUnhealthy gauge
io_trino_gateway_ha_router_name_ClusterStats_cluster_id_trino1_TrinoStatusUnhealthy 0.0
# TYPE io_trino_gateway_ha_router_name_ClusterStats_cluster_id_trino1_NumWorkerNodes gauge
io_trino_gateway_ha_router_name_ClusterStats_cluster_id_trino1_NumWorkerNodes 0.0

I didn't implement unregister logic, because under the current implementation BackendStateManager.clusterStats won't delete old Trino cluster. We may need another PR to refactor BackendStateManager and *Observer classes to support this.

Release notes

(X) Release notes are required, with the following suggested text:

* Expose Trino cluster state via JMX.

Summary by Sourcery

Expose Trino backend state via JMX by exporting dynamic ClusterStatsJMX MBeans for each cluster and updating dependency injection to support MBeanExporter

New Features:

  • Expose Trino cluster state via JMX MBeans

Enhancements:

  • BackendStateManager now injects MBeanExporter and registers a ClusterStatsJMX MBean to expose running queries, queued queries, worker nodes, and status flags
  • Update Guice module to bind BackendStateManager as a singleton and support MBeanExporter injection

Tests:

  • Add integration test to verify JMX metrics appear on the /metrics endpoint

@cla-bot cla-bot bot added the cla-signed label Oct 9, 2025
@sourcery-ai
Copy link

sourcery-ai bot commented Oct 9, 2025

Reviewer's Guide

This PR integrates JMX exposure for Trino backend cluster state by injecting an MBeanExporter into BackendStateManager, defining a ClusterStatsJMX bean with Managed annotations and exporting/updating it in updateStates, updates the DI configuration to bind BackendStateManager via Guice, and adds an integration test to verify the new JMX metrics are exposed properly.

File-Level Changes

Change Details Files
Inject and utilize an MBeanExporter in BackendStateManager to export cluster stats via JMX
  • Inject MBeanExporter via constructor
  • Initialize and track ClusterStatsJMX instances per cluster
  • Call exporter.exportWithGeneratedName on first update and update existing beans
gateway-ha/src/main/java/io/trino/gateway/ha/router/BackendStateManager.java
Define a ClusterStatsJMX inner class with @Managed getters for cluster metrics
  • Add fields mirroring ClusterStats properties
  • Implement updateFrom to populate fields
  • Annotate getters for runningQueryCount, queuedQueryCount, numWorkerNodes and TrinoStatus flags
gateway-ha/src/main/java/io/trino/gateway/ha/router/BackendStateManager.java
Adjust Guice module to bind BackendStateManager as a singleton
  • Bind BackendStateManager in configure()
  • Remove manual instantiation field and @provides method
gateway-ha/src/main/java/io/trino/gateway/ha/module/HaGatewayProviderModule.java
Add integration test to verify JMX metrics exposure
  • Implement testClusterStatsJMX with HTTP call to /metrics
  • Assert presence of TrinoStatusHealthy entries for configured clusters
gateway-ha/src/test/java/io/trino/gateway/ha/TestGatewayHaMultipleBackend.java

Possibly linked issues

  • #Proposal to improve telemetry of Trino Gateway: The PR exposes Trino cluster health and status metrics via JMX, which directly implements the proposal to track cluster activation status for improved telemetry in the issue.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Replace the fixed Thread.sleep in testClusterStatsJMX with a polling loop and timeout to avoid flakiness in the test.
  • Consider using Map.computeIfAbsent for clusterStatsJMXs in updateStates to streamline the creation and registration logic.
  • Add unregister logic for MBeans when a cluster is removed or its stats are no longer tracked to prevent JMX resource leaks.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Replace the fixed Thread.sleep in testClusterStatsJMX with a polling loop and timeout to avoid flakiness in the test.
- Consider using Map.computeIfAbsent for clusterStatsJMXs in updateStates to streamline the creation and registration logic.
- Add unregister logic for MBeans when a cluster is removed or its stats are no longer tracked to prevent JMX resource leaks.

## Individual Comments

### Comment 1
<location> `gateway-ha/src/main/java/io/trino/gateway/ha/router/BackendStateManager.java:68-71` </location>
<code_context>
+
+    public static class ClusterStatsJMX
+    {
+        int runningQueryCount;
+        int queuedQueryCount;
+        int numWorkerNodes;
+        TrinoStatus trinoStatus;
+
+        public ClusterStatsJMX(ClusterStats clusterStats)
</code_context>

<issue_to_address>
**suggestion:** Fields in ClusterStatsJMX should be private for encapsulation.

Making these fields private will prevent external classes from modifying them and improve encapsulation.

```suggestion
        private int runningQueryCount;
        private int queuedQueryCount;
        private int numWorkerNodes;
        private TrinoStatus trinoStatus;
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@oneonestar oneonestar force-pushed the ypoon/backendstate_jmx branch from 8131821 to 54fe806 Compare October 9, 2025 09:33
@oneonestar oneonestar force-pushed the ypoon/backendstate_jmx branch from 54fe806 to 92b1151 Compare October 9, 2025 09:38
@ebyhr ebyhr merged commit 71c12d9 into trinodb:main Oct 10, 2025
3 checks passed
@github-actions github-actions bot added this to the 17 milestone Oct 10, 2025
@oneonestar oneonestar deleted the ypoon/backendstate_jmx branch October 10, 2025 06:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants